Reflections and a Proposal for a Query and Reporting Language for Richly Annotated Multiparallel Corpora

Author

  • Simon Clematide
Abstract

Large and open multiparallel corpora are a valuable resource for contrastive corpus linguists if the data is annotated and stored in a way that allows precise and flexible ad hoc searches. A linguistic query language should also support computational linguists in automated multilingual data mining. We review a broad range of approaches for linguistic query and reporting languages according to usability criteria such as expressibility, expressiveness, and efficiency. We propose an architecture that tries to strike the right balance to suit practical purposes.

Similar resources

Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence and word alignment across multiple languages. Additionally, linguistic annotation such as part-of-speech tagging, lemmatisation, chunking, and dependency parsin...

Implementing Linguistic Query Languages Using LoToS

A linguistic database is a collection of texts where sentences and words are annotated with linguistic information, such as part of speech, morphology, and syntactic sentence structure. While early linguistic databases focused on word annotations, and later also on parse-trees of sentences (so-called treebanks), the recent years have seen a growing interest in richly annotated corpora of histor...

ANNIS3: A new architecture for generic corpus query and visualization

This paper is concerned with the data structures, properties of query languages and visualization facilities required for the generic representation of richly annotated, heterogeneous linguistic corpora. We propose that above and beyond a general graph based data-model, which is becoming increasingly popular in many complex annotation formats, a well-defined concept of multiple, potentially con...

Storing and Querying Historical Texts in a Relational Database

This paper describes an approach for storing and querying a large corpus of linguistically annotated historical texts in a relational database management system. Texts in such a corpus have a complex structure consisting of multiple text layers that are richly annotated and aligned to each other. Modeling and managing such corpora poses various challenges not present in simpler text collections...

Finite Structure Query: A Tool for Querying Syntactically Annotated Corpora

Finite structure query (fsq for short) is a tool for querying syntactically annotated corpora. fsq employs a query language of high expressive power, namely full first order logic. It can be used to query arbitrary finite structures, not just trees.

Publication date: 2015